game history
General Modular Harness for LLM Agents in Multi-Turn Gaming Environments
Zhang, Yuxuan, Yu, Haoyang, Hu, Lanxiang, Jin, Haojian, Zhang, Hao
We introduce a modular harness design for LLM agents composed of perception, memory, and reasoning components, enabling a single LLM or VLM backbone to tackle a wide spectrum of multi-turn gaming environments without domain-specific engineering. Using classic and modern game suites as low-barrier, high-diversity testbeds, our framework provides a unified workflow for analyzing how each module affects performance across dynamic interactive settings. Extensive experiments demonstrate that the harness lifts gameplay performance consistently over un-harnessed baselines and reveal distinct contribution patterns: for example, memory dominates in long-horizon puzzles, while perception is critical in visually noisy arcade games. These findings highlight the effectiveness of our modular harness design in advancing general-purpose agents, given the familiarity and ubiquity of games in everyday human experience.
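The perception, memory, and reasoning composition described in the abstract above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Harness` class, its `use_perception` and `use_memory` switches, the `window` size, and the toy callables are hypothetical names, not the authors' implementation or API. A real harness would call an LLM or VLM where `toy_reason` appears.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Harness:
    """Hypothetical harness: each module can be switched off for ablation."""
    perceive: Callable[[object], str]        # raw observation -> text description
    reason: Callable[[str, List[str]], str]  # (description, recalled history) -> action
    memory: List[str] = field(default_factory=list)
    use_perception: bool = True
    use_memory: bool = True
    window: int = 5  # number of past turns handed to the reasoner (assumed)

    def step(self, observation: object) -> str:
        # Perception: describe the observation, or pass it through unprocessed.
        state = self.perceive(observation) if self.use_perception else str(observation)
        # Memory: recall the most recent turns, or nothing when ablated.
        recalled = self.memory[-self.window:] if self.use_memory else []
        # Reasoning: choose an action from the state and the recalled history.
        action = self.reason(state, recalled)
        self.memory.append(f"{state} -> {action}")
        return action


def toy_perceive(obs: object) -> str:
    return f"tile={obs}"


def toy_reason(state: str, recalled: List[str]) -> str:
    # Stand-in for a model call: the action only reports how much history it saw.
    return f"move{len(recalled)}"


agent = Harness(toy_perceive, toy_reason, window=2)
actions = [agent.step(o) for o in (1, 2, 3, 4)]

ablated = Harness(toy_perceive, toy_reason, use_memory=False)
ablated_actions = [ablated.step(o) for o in (7, 8)]
```

Toggling one switch at a time while keeping the same backbone is one simple way to compare module contributions, in the spirit of the analysis the abstract describes.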
Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents
Zhang, LeCheng, Wang, Yuanshi, Shen, Haotian, Wang, Xujie
The Da Vinci Code, a game of logical deduction and imperfect information, presents unique challenges for artificial intelligence, demanding nuanced reasoning beyond simple pattern recognition. This paper investigates the efficacy of various AI paradigms in mastering this game. We develop and evaluate three distinct agent architectures: a Transformer-based baseline model with limited historical context, several Large Language Model (LLM) agents (including Gemini, DeepSeek, and GPT variants) guided by structured prompts, and an agent based on Proximal Policy Optimization (PPO) employing a Transformer encoder for comprehensive game history processing. Performance is benchmarked against the baseline, with the PPO-based agent demonstrating superior win rates ($58.5\% \pm 1.0\%$), significantly outperforming the LLM counterparts. Our analysis highlights the strengths of deep reinforcement learning in policy refinement for complex deductive tasks, particularly in learning implicit strategies from self-play. We also examine the capabilities and inherent limitations of current LLMs in maintaining strict logical consistency and strategic depth over extended gameplay, despite sophisticated prompting. This study contributes to the broader understanding of AI in recreational games involving hidden information and multi-step logical reasoning, offering insights into effective agent design and the comparative advantages of different AI approaches.
Scans for the memories: why old games magazines are a vital source of cultural history – and nostalgia
Before the internet, if you were an avid gamer then you were very likely to be an avid reader of games magazines. From the early 1980s, the likes of Crash, Mega, PC Gamer and the Official PlayStation Magazine were your connection with the industry, providing news, reviews and interviews as well as lively letters pages that fostered a sense of community. Very rarely, however, did anyone keep hold of their magazine collections. Lacking the cultural gravitas of music or movie publications, they were mostly thrown away. While working at Future Publishing as a games journalist in the 1990s, I watched many times as hundreds of old issues of SuperPlay, Edge and GamesMaster were tipped into skips for pulping.
GameArena: Evaluating LLM Reasoning through Live Computer Games
Hu, Lanxiang, Li, Qiyu, Xie, Anze, Jiang, Nan, Stoica, Ion, Jin, Haojian, Zhang, Hao
Evaluating the reasoning abilities of large language models (LLMs) is challenging. Existing benchmarks often depend on static datasets, which are vulnerable to data contamination and may become saturated over time, or on binary live human feedback that conflates reasoning with other abilities. As the most prominent dynamic benchmark, Chatbot Arena evaluates open-ended questions in real-world settings, but lacks the granularity to assess specific reasoning capabilities. We introduce GameArena, a dynamic benchmark designed to evaluate LLM reasoning capabilities through interactive gameplay with humans. GameArena consists of three games designed to test specific reasoning capabilities (e.g., deductive and inductive reasoning), while keeping participants entertained and engaged. We analyze the gaming data retrospectively to uncover the underlying reasoning processes of LLMs and measure their fine-grained reasoning capabilities. We collect over 2000 game sessions and provide detailed assessments of various reasoning capabilities for five state-of-the-art LLMs. Our user study with 100 participants suggests that GameArena improves user engagement compared to Chatbot Arena. For the first time, GameArena enables the collection of step-by-step LLM reasoning data in the wild.
XQSV: A Structurally Variable Network to Imitate Human Play in Xiangqi
In this paper, we introduce an innovative deep learning architecture, termed Xiangqi Structurally Variable (XQSV), designed to emulate the behavioral patterns of human players in Xiangqi, or Chinese Chess. The unique attribute of XQSV is its capacity to alter its structural configuration dynamically, optimizing performance for the task based on the particular subset of data on which it is trained. We have incorporated several design improvements to significantly enhance the network's predictive accuracy, including a local illegal move filter, an Elo range partitioning, a sequential one-dimensional input, and a simulation of imperfect memory capacity. Empirical evaluations reveal that XQSV attains a predictive accuracy of approximately 40%, with its performance peaking within the trained Elo range. This indicates the model's success in mimicking the play behavior of individuals within that specific range. A three-terminal Turing Test was employed to demonstrate that the XQSV model imitates human behavior more accurately than conventional Xiangqi engines, rendering it indistinguishable from actual human opponents. Given the inherent nondeterminism in human gameplay, we propose two supplementary relaxed evaluation metrics. To our knowledge, XQSV represents the first model to mimic Xiangqi players.
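The XQSV abstract above lists an illegal move filter among its design improvements without giving details. One common way to realize such a filter, shown here purely as an assumed sketch and not as the paper's method, is to drop the scores of illegal moves and renormalize the rest with a softmax. The function name `filter_illegal` and the move labels are hypothetical.

```python
import math
from typing import Dict, Set


def filter_illegal(scores: Dict[str, float], legal: Set[str]) -> Dict[str, float]:
    """Keep only legal moves and renormalize their scores with a softmax."""
    kept = {move: s for move, s in scores.items() if move in legal}
    if not kept:
        raise ValueError("no legal move has a score")
    peak = max(kept.values())  # subtract the max for numerical stability
    exps = {move: math.exp(s - peak) for move, s in kept.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}


# Hypothetical raw network scores; "c" scores highest but is not legal here.
raw_scores = {"a": 2.0, "b": 2.0, "c": 5.0}
probs = filter_illegal(raw_scores, legal={"a", "b"})
```

Because the illegal move is removed before normalization, its high raw score cannot absorb probability mass, so the prediction is always a move a human player could actually have made.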
Pushing Buttons: What the biggest deal in games history means for Call of Duty, Overwatch and more
Last week, Microsoft completed its $69bn purchase of Activision Blizzard, sealing a deal that many called the biggest in video game history (although they are overlooking the 1965 merger of Nihon Goraku Bussan and Rosen Enterprises to form the glorious Sega Enterprises, but let's not get into that). Microsoft was keen to slightly downplay the significance of the moment in its own press release, pointing out that it will become only, "the world's third-largest [emphasis my own] gaming company by revenue, behind Tencent and Sony". However, we all understand the awesome power it now wields, with Call of Duty, World of Warcraft, Overwatch and Candy Crush Saga under its command. How will this affect us, the gamers? Not much to begin with.
Why do video games matter? 20 books every player should read
At this stage in the pandemic, you may have started to question the amount of time you're spending playing video games. Publishers have reported huge increases in the numbers of players on titles such as Call of Duty Warzone and Fifa 21, while Animal Crossing, launched in the first weeks of last year's lockdown, has sold more than 30m copies, mostly on its seductive promise to bring friends together for tea parties on cute little islands. Perhaps now, however, you want to spend some time away from games – but without abandoning them. Or maybe you want to find out why Assassin's Creed Valhalla has such an unassailable grip on your attention. Either way, here are 20 books that tell us more about games, or are likely to be interesting to people who play them a lot.